Sample and Computationally Efficient Learning Algorithms under S-Concave Distributions
Authors
Abstract
We provide new results for noise-tolerant and sample-efficient learning algorithms under s-concave distributions. The new class of s-concave distributions is a broad and natural generalization of log-concavity, and includes many important additional distributions, e.g., the Pareto distribution and t-distribution. This class has been studied in the context of efficient sampling, integration, and optimization, but much remains unknown about the geometry of this class of distributions and their applications in the context of learning. The challenge is that unlike the commonly used distributions in learning (uniform or more generally log-concave distributions), this broader class is not closed under the marginalization operator and many such distributions are fat-tailed. In this work, we introduce new convex geometry tools to study the properties of s-concave distributions and use these properties to provide bounds on quantities of interest to learning, including the probability of disagreement between two halfspaces, disagreement outside a band, and the disagreement coefficient. We use these results to significantly generalize prior results for margin-based active learning, disagreement-based active learning, and passive learning of intersections of halfspaces. Our analysis of geometric properties of s-concave distributions might be of independent interest to optimization more broadly.
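To make the generalization concrete, here is a minimal numerical sketch (an illustration assumed for this page, not code from the paper): for s < 0, a density f is s-concave exactly when f**s is convex, and the Pareto density f(x) = a·x^(-(a+1)) on [1, ∞) satisfies this for s = -1/(a+1), since f**s is then linear in x.

```python
# Hedged sketch: numerically check that the Pareto density with shape a = 2
# is s-concave for s = -1/(a+1) = -1/3, using the characterization that
# (for s < 0) f is s-concave iff f**s is convex.

def pareto_density(x, a=2.0):
    """Pareto density a * x^{-(a+1)} on [1, infinity)."""
    return a * x ** (-(a + 1))

def is_midpoint_convex(g, xs, tol=1e-9):
    """Check midpoint convexity g((x+y)/2) <= (g(x)+g(y))/2 on a grid."""
    return all(
        g((x + y) / 2) <= (g(x) + g(y)) / 2 + tol
        for x in xs for y in xs
    )

s = -1.0 / 3.0
xs = [1.0 + 0.25 * i for i in range(40)]
# f(x)**s = (2 x^{-3})^{-1/3} = 2^{-1/3} * x, which is linear, hence convex.
print(is_midpoint_convex(lambda x: pareto_density(x) ** s, xs))  # True
```

The same midpoint check rejects non-convex functions, so it serves as a crude but honest certificate on a grid.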
Similar papers
Active and passive learning of linear separators under log-concave distributions
We provide new results concerning label efficient, polynomial time, passive and active learning of linear separators. We prove that active learning provides an exponential improvement over PAC (passive) learning of homogeneous linear separators under nearly log-concave distributions. Building on this, we provide a computationally efficient PAC algorithm with optimal (up to a constant factor) sa...
The Power of Localization for Efficiently Learning Linear Separators with Noise
We introduce a new approach for designing computationally efficient learning algorithms that are tolerant to noise, and demonstrate its effectiveness by designing algorithms with improved noise tolerance guarantees for learning linear separators. We consider both the malicious noise model of Valiant [Valiant 1985; Kearns and Li 1988] and the adversarial label noise model of Kearns, Schapire, an...
Fourier-Based Testing for Families of Distributions
We study the general problem of testing whether an unknown discrete distribution belongs to a given family of distributions. More specifically, given a class of distributions P and sample access to an unknown distribution P, we want to distinguish (with high probability) between the case that P ∈ P and the case that P is ε-far, in total variation distance, from every distribution in P. This is...
Efficient Robust Proper Learning of Log-concave Distributions
We study the robust proper learning of univariate log-concave distributions (over continuous and discrete domains). Given a set of samples drawn from an unknown target distribution, we want to compute a log-concave hypothesis distribution that is as close as possible to the target, in total variation distance. In this work, we give the first computationally efficient algorithm for this learning...
Learning mixtures of structured distributions over discrete domains
Let C be a class of probability distributions over the discrete domain [n] = {1, ..., n}. We show that if C satisfies a rather general condition – essentially, that each distribution in C can be well-approximated by a variable-width histogram with few bins – then there is a highly efficient (both in terms of running time and sample complexity) algorithm that can learn any mixture of k unknow...
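The variable-width histogram condition above can be illustrated numerically. One common way to place few bins is at sample quantiles (equal-mass bins); this is an assumed sketch of the idea, not the authors' algorithm:

```python
import numpy as np

def equal_mass_histogram(samples, k):
    """Fit a k-bin variable-width histogram: bin edges sit at sample
    quantiles, so each bin carries roughly equal probability mass."""
    samples = np.sort(np.asarray(samples, dtype=float))
    edges = np.unique(np.quantile(samples, np.linspace(0.0, 1.0, k + 1)))
    counts, edges = np.histogram(samples, bins=edges)
    widths = np.diff(edges)
    # Piecewise-constant density estimate; integrates to 1 by construction.
    densities = counts / (counts.sum() * widths)
    return edges, densities

rng = np.random.default_rng(0)
edges, densities = equal_mass_histogram(rng.exponential(size=1000), k=10)
```

Equal-mass bins adapt their width to the data, which is what lets a histogram with few bins approximate shapes (e.g., fat tails) that fixed-width bins would need many bins to capture.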